Development of a deep learning‐based nomogram for predicting lymph node metastasis in cervical cancer: A multicenter study

Dear Editor,
Cervical cancer is one of the most frequently diagnosed cancers in women and has a high mortality rate worldwide.1 Lymph node metastasis (LNM) is an important prognostic factor in patients with cervical cancer.2–4 Assessing LNM before treatment is essential for guiding and tailoring treatment.5,6 Morphological examination of lymph nodes on medical images is commonly used to diagnose LNM. However, it depends mainly on the radiologist's experience and has relatively low accuracy. We therefore collected a multicenter dataset and developed a deep learning-based nomogram (DLN) to improve the accuracy of LNM diagnosis in cervical cancer.
In total, 1123 cervical cancer patients who underwent computed tomography (CT) examination were enrolled from 13 centers (Table S1 and Supplementary A1). As shown in Supplementary A2 and Figure S1, we divided these patients into four cohorts: a training cohort, a validation cohort, external testing cohort 1, and external testing cohort 2. Detailed information on the four cohorts is presented in Table S2. The clinical characteristics included age, gravidity, histological type, and FIGO stage, among others. Moreover, two experienced gynecologists, who were blinded to the pathological report, were invited to jointly diagnose LNM status using only CT images. Additionally, a follow-up cohort of 148 patients from one center was used for survival analysis.
The workflow of this study is described in Figure 1 and comprises region of interest (ROI) segmentation, data preprocessing (Supplementary A3), model construction, and model evaluation (Supplementary A4). We invited experienced gynecologists to segment ROIs in normalized CT images. Before model construction, data augmentation, including flipping, rotating, and random cropping, was used to generate new training samples and reduce overfitting. Oversampling methods were used to balance the ratio of LNM-positive to LNM-negative patients in the training cohort. Three state-of-the-art deep learning architectures, ResNet18,7 ResNet50,7 and SE-Net,8 were used to construct three candidate models (Supplementary A5). As shown in Table S3, ResNet18 performed best in the validation cohort and was therefore selected to build the final deep learning signature (Sig_DL). As shown in Supplementary A6, a total of 1407 handcrafted radiomic features were extracted, and three key radiomic features were selected via a series of feature selection methods and integrated into a radiomic signature (Sig_radiomic).9,10 As shown in Table 1 and Figure S2, Sig_DL achieved higher AUCs than Sig_radiomic in all cohorts.
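The oversampling step mentioned above can be illustrated with a minimal sketch. The letter does not state which oversampling method was used, so the random duplication of minority-class samples shown here is an assumption, and the function name `oversample_minority` is hypothetical.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until both classes are equal in size.

    Assumption: plain random oversampling; the study's actual method is not specified.
    labels use 1 for LNM-positive and 0 for LNM-negative.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Identify the smaller class and how many copies are needed to match the larger one.
    minority, minority_label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_extra = max(len(pos), len(neg)) - len(minority)

    # Draw extra samples with replacement from the minority class only.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(samples) + extra, list(labels) + [minority_label] * n_extra
```

Such balancing would be applied to the training cohort only, so that the validation and testing cohorts keep their original class ratio.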
Additionally, univariate analysis was used to screen for significant clinical features. We found that the FIGO stage was significantly associated with LNM (P < 0.01). After multivariable logistic regression, we selected the FIGO stage and age as key clinical features and used them to construct a clinical signature (Sig_clin). The areas under the receiver operating characteristic curve (AUCs) of Sig_clin were 0.678 and 0.597 in the training and validation cohorts, respectively.
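For readers less familiar with the metric, the AUC reported for each signature can be computed directly from predicted scores and LNM labels. The following is a minimal sketch using the pairwise (Mann-Whitney) definition; the function name `roc_auc` is hypothetical and the code is not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a randomly chosen LNM-positive patient receives
    a higher score than a randomly chosen LNM-negative patient (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of LNM-positive from LNM-negative patients.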
Finally, we integrated Sig_DL, the gynecologists' diagnoses, and all significant clinical features into a DLN via multivariate linear regression analysis (Table S4 and Figure 2A). Compared with the other models, DLN had the best predictive ability (Figure S3), with AUCs of 0.867, 0.807, 0.781, and 0.804 in the training cohort, validation cohort, external testing cohort 1, and external testing cohort 2, respectively (Figure 2B-E). As shown in Table 1, the accuracy values also indicated good performance of DLN in these four cohorts.
Meanwhile, the decision curves showed that patients could benefit more from DLN than from either Sig_DL or Sig_clin (Figure 2F). As shown in Figure 2G, the calibration curves demonstrated that DLN had good consistency with the gold standard of LNM.
It is worth noting that the gynecologists' diagnoses had high specificity but low sensitivity in our cohorts. Therefore, we modified the cutoff value so that DLN had the same specificity as the gynecologists' diagnoses. At this cutoff, DLN had better accuracy and sensitivity than the gynecologists (Table S5). The Venn diagrams also showed that DLN identified more true positive cases than the gynecologists (Figure S4). Four typical cases are shown in Figure 3, which indicates that DLN could help clinicians reduce the risk of misdiagnosis.
Subgroup analysis was performed on the data of the enrolled patients, covering their clinical characteristics, the CT manufacturers, and the centers. As shown in Figure S5A-F, the subgroup analysis indicated that DLN was not affected by age, gravidity, human papillomavirus (HPV) test result, or histological type. In particular, 614 cervical cancer patients were selected for HPV testing. Subgroup analysis revealed that DLN performed well in both the HPV-positive and HPV-negative subgroups (Figure S5G-H). Our model was also minimally affected by the CT manufacturers and centers (Figure S6A,B).
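The cutoff adjustment used for the comparison with the gynecologists can be sketched as follows. This is an illustration under stated assumptions, not the study's code: a patient is called LNM-positive when the score is at or above the cutoff, the cutoff is chosen as the lowest value whose specificity reaches the target, and both function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called LNM-positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def cutoff_for_specificity(scores, labels, target_specificity):
    """Lowest cutoff whose specificity is at least the target.

    Specificity never decreases as the cutoff rises, so the first cutoff that
    reaches the target also keeps sensitivity as high as possible.
    """
    # Candidate cutoffs: every observed score, plus infinity (call everyone negative).
    candidates = sorted(set(scores)) + [float("inf")]
    for cutoff in candidates:
        _, spec = sensitivity_specificity(scores, labels, cutoff)
        if spec >= target_specificity:
            return cutoff
    raise ValueError("target specificity must be at most 1.0")
```

With the readers' specificity as `target_specificity`, the sensitivity returned by `sensitivity_specificity` at the chosen cutoff can then be compared with the readers' sensitivity at a matched operating point.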
In addition, 148 cervical cancer patients with follow-up data from Center 2 were used to explore the association between DLN score and overall survival (OS) using Kaplan-Meier curves (Supplementary A7). We divided them into low-risk and high-risk groups using the mean DLN score as the cutoff. As shown in Figure 2H, the high-risk group exhibited shorter OS (log-rank test: P = 0.0012). Furthermore, we stratified patients by FIGO stage for comparison; however, the FIGO stage showed no significant association with OS (Figure S7). Hence, DLN could serve as a significant prognostic factor for cervical cancer.
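The risk grouping and survival estimate described above can be sketched in a few lines. This is a minimal illustration, not the study's code: treating a score strictly above the mean as high risk is an assumption, the function names are hypothetical, and the log-rank test is omitted.

```python
def split_by_mean(scores):
    """Return True for high-risk patients (score above the mean), else False.

    Assumption: scores exactly equal to the mean are assigned to the low-risk group.
    """
    mean = sum(scores) / len(scores)
    return [s > mean for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= removed
    return curve
```

Calling `kaplan_meier` separately on the high-risk and low-risk groups produced by `split_by_mean` yields the two curves that a log-rank test would then compare.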
In conclusion, we developed a deep learning model for the preoperative prediction of LNM in cervical cancer and validated it on a large-scale, multicenter dataset. DLN outperformed the diagnoses of experienced gynecologists. Therefore, DLN can serve as a non-invasive tool for LNM determination and thus assist treatment decision-making.